May 10, 2021

Project Outline

  1. Introduction
  2. Materials and methods
  3. Results and discussion

     3.1 Exploratory data analysis
     3.2 Modelling

  4. Conclusion

Introduction

Introduction: Dataset

COVID-19 World Vaccine Adverse Reactions

  • Data from the Vaccine Adverse Event Reporting System (VAERS) created by the Food and Drug Administration (FDA) and Centers for Disease Control and Prevention (CDC)
  • Contains 3 datasets:
    1. PATIENTS.CSV
    2. VACCINES.CSV
    3. SYMPTOMS.CSV
  • Datasets connected by patient IDs (VAERS_ID)
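
A minimal sketch of how the three tables could be linked on VAERS_ID, assuming dplyr is available. The toy rows reuse a few values from the printed tibbles, except the third symptoms row, which is invented to show that one ID can span several rows (the symptoms file has more rows than the patients file, which suggests this happens).

```r
library(dplyr)

# Toy stand-ins for the three files (not the real data).
patients <- tibble(VAERS_ID = c("0916600", "0916601"),
                   AGE_YRS  = c(33, 73))
vaccines <- tibble(VAERS_ID = c("0916600", "0916601"),
                   VAX_MANU = c("MODERNA", "MODERNA"))
# The third row is invented: a second symptoms row for the same report.
symptoms <- tibble(VAERS_ID = c("0916600", "0916601", "0916601"),
                   SYMPTOM1 = c("Dysphagia", "Anxiety", "Dyspnoea"))

# Join on the shared key; a report with two symptom rows yields two merged rows.
merged <- patients %>%
  inner_join(vaccines, by = "VAERS_ID") %>%
  inner_join(symptoms, by = "VAERS_ID")

merged
```

Because the join can multiply rows, it is worth checking the row count after each step before computing any per-patient statistic.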


PATIENTS.CSV: Contains information about the individuals who received the vaccines

## # A tibble: 34,121 x 35
##   VAERS_ID RECVDATE  STATE AGE_YRS CAGE_YR CAGE_MO SEX   RPT_DATE   SYMPTOM_TEXT
##   <chr>    <chr>     <chr>   <dbl>   <dbl>   <dbl> <chr> <date>     <chr>       
## 1 0916600  01/01/20… TX         33      33      NA F     NA         "Right side…
## 2 0916601  01/01/20… CA         73      73      NA F     NA         "Approximat…
## 3 0916602  01/01/20… WA         23      23      NA F     NA         "About 15 m…
## # … with 34,118 more rows, and 26 more variables: DIED <chr>, DATEDIED <chr>,
## #   L_THREAT <chr>, ER_VISIT <chr>, HOSPITAL <chr>, HOSPDAYS <dbl>,
## #   X_STAY <chr>, DISABLE <chr>, RECOVD <chr>, VAX_DATE <chr>,
## #   ONSET_DATE <chr>, NUMDAYS <dbl>, LAB_DATA <chr>, V_ADMINBY <chr>,
## #   V_FUNDBY <chr>, OTHER_MEDS <chr>, CUR_ILL <chr>, HISTORY <chr>,
## #   PRIOR_VAX <chr>, SPLTTYPE <chr>, FORM_VERS <dbl>, TODAYS_DATE <chr>,
## #   BIRTH_DEFECT <chr>, OFC_VISIT <chr>, ER_ED_VISIT <chr>, ALLERGIES <chr>


VACCINES.CSV: Contains information about the vaccine received

## # A tibble: 34,630 x 8
##    VAERS_ID VAX_TYPE VAX_MANU         VAX_LOT VAX_DOSE_SERIES VAX_ROUTE VAX_SITE
##    <chr>    <chr>    <chr>            <chr>   <chr>           <chr>     <chr>   
##  1 0916600  COVID19  "MODERNA"        037K20A 1               IM        LA      
##  2 0916601  COVID19  "MODERNA"        025L20A 1               IM        RA      
##  3 0916602  COVID19  "PFIZER\\BIONTE… EL1284  1               IM        LA      
##  4 0916603  COVID19  "MODERNA"        unknown <NA>            <NA>      <NA>    
##  5 0916604  COVID19  "MODERNA"        <NA>    1               IM        LA      
##  6 0916606  COVID19  "MODERNA"        011J20A 1               IM        LA      
##  7 0916607  COVID19  "MODERNA"        <NA>    <NA>            IM        LA      
##  8 0916608  COVID19  "MODERNA"        <NA>    1               IM        LA      
##  9 0916609  COVID19  "MODERNA"        011J20… 1               IM        LA      
## 10 0916610  COVID19  "MODERNA"        <NA>    1               SYR       LA      
## # … with 34,620 more rows, and 1 more variable: VAX_NAME <chr>


SYMPTOMS.CSV: Contains information about the symptoms experienced after vaccination

## # A tibble: 48,110 x 11
##   VAERS_ID SYMPTOM1     SYMPTOMVERSION1 SYMPTOM2     SYMPTOMVERSION2 SYMPTOM3   
##   <chr>    <chr>                  <dbl> <chr>                  <dbl> <chr>      
## 1 0916600  Dysphagia               23.1 Epiglottitis            23.1 <NA>       
## 2 0916601  Anxiety                 23.1 Dyspnoea                23.1 <NA>       
## 3 0916602  Chest disco…            23.1 Dysphagia               23.1 Pain in ex…
## 4 0916603  Dizziness               23.1 Fatigue                 23.1 Mobility d…
## 5 0916604  Injection s…            23.1 Injection s…            23.1 Injection …
## 6 0916606  Pharyngeal …            23.1 <NA>                    NA   <NA>       
## # … with 48,104 more rows, and 5 more variables: SYMPTOMVERSION3 <dbl>,
## #   SYMPTOM4 <chr>, SYMPTOMVERSION4 <dbl>, SYMPTOM5 <chr>,
## #   SYMPTOMVERSION5 <dbl>

Introduction: Aim

The aim of this project is to gain insight into the adverse effects reported for different COVID-19 vaccines and to answer questions such as:

  • Do some vaccines cause more/different symptoms than others?

  • Do patients with some profiles get more/different symptoms?

  • Are certain symptoms correlated with death?

  • Is patient profile correlated with death?

  • Does taking anti-inflammatory drugs reduce the chance of having symptoms?

Methods

Methods: Project workflow

  1. Load data sets (patients, vaccines, symptoms)
  2. Clean each data set individually
  3. Augment and merge the data sets
  4. Exploratory data analysis
     • Visualizations
     • PCA
  5. Modelling
     • Logistic regressions
     • Proportion testing

Methods: Challenges and solutions

Cleaning

  • NAs that should be interpreted as “no” → replace_na(list(ALLERGIES = "N"))
  • Duplicated IDs → add_count(VAERS_ID) %>% filter(n == 1) %>% select(-n)
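
The two cleaning steps above can be sketched on a toy table, assuming dplyr and tidyr. The IDs and allergy values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy patient table; "A2" appears twice to mimic a duplicated VAERS_ID.
patients <- tibble(
  VAERS_ID  = c("A1", "A2", "A2", "A3"),
  ALLERGIES = c(NA, "Penicillin", "Penicillin", NA)
)

patients_clean <- patients %>%
  # A missing ALLERGIES entry is read as "no allergies reported".
  replace_na(list(ALLERGIES = "N")) %>%
  # Count rows per ID, keep IDs that occur exactly once, drop the helper column.
  add_count(VAERS_ID) %>%
  filter(n == 1) %>%
  select(-n)

patients_clean
```

Note that this filter discards every copy of a duplicated ID. If keeping one copy were preferred, `distinct(VAERS_ID, .keep_all = TRUE)` would be an alternative.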

Augment

  • Columns containing long string descriptions → Make tidy categorical (Y/N) variables
## # A tibble: 3 x 3
##   VAERS_ID OTHER_MEDS                     TAKES_ANTIINFLAMATORY
##   <chr>    <chr>                          <chr>                
## 1 0916983  <NA>                           N                    
## 2 0916988  Ibuprofen  PM the night before Y                    
## 3 0916996  Clobetasol, Benadryl           N
  • Too many symptoms and untidy → extract top 20 occurring symptoms and turn them into tidy categorical (TRUE/FALSE) columns
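
One way the two augmentation steps might look, assuming dplyr, tidyr and stringr. The drug keyword list and the long-format symptoms table are hypothetical: the slides do not say which drug names were matched or how the symptom columns were reshaped.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Rows taken from the example tibble above.
meds <- tibble(
  VAERS_ID   = c("0916983", "0916988", "0916996"),
  OTHER_MEDS = c(NA, "Ibuprofen  PM the night before", "Clobetasol, Benadryl")
)

# Hypothetical keyword list, matched case-insensitively.
antiinflammatory_pattern <- regex("ibuprofen|aspirin|naproxen", ignore_case = TRUE)

meds <- meds %>%
  mutate(TAKES_ANTIINFLAMATORY = if_else(
    str_detect(coalesce(OTHER_MEDS, ""), antiinflammatory_pattern), "Y", "N"
  ))

# Toy long-format symptoms (one row per report and symptom); values are made up.
symptoms_long <- tibble(
  VAERS_ID = c("1", "1", "2", "2", "3", "3"),
  SYMPTOM  = c("Headache", "Chills", "Headache", "Chills", "Headache", "Rash")
)

top_n_symptoms <- 2  # the slides use 20

# Most frequent symptoms across all reports.
top_symptoms <- symptoms_long %>%
  count(SYMPTOM, sort = TRUE) %>%
  slice_head(n = top_n_symptoms) %>%
  pull(SYMPTOM)

# One TRUE/FALSE column per top symptom, one row per report.
symptom_flags <- symptoms_long %>%
  filter(SYMPTOM %in% top_symptoms) %>%
  mutate(SYMPTOM = str_to_upper(SYMPTOM), present = TRUE) %>%
  pivot_wider(names_from = SYMPTOM, values_from = present, values_fill = FALSE)

symptom_flags
```

Reports with none of the top symptoms drop out of `symptom_flags` in this sketch, so a later join back to the patient table would need to fill their flags with FALSE.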

Exploratory Data Analysis

Visualization

Group representation: Age, sex and manufacturer

## # A tibble: 3 x 2
##   SEX       n
##   <chr> <int>
## 1 F     24070
## 2 M      8514
## 3 <NA>    828
## # A tibble: 3 x 2
##   VAX_MANU            n
##   <chr>           <int>
## 1 JANSSEN          1106
## 2 MODERNA         16253
## 3 PFIZER-BIONTECH 16053

Visualization

Days until onset of symptoms vs. Age Group

Hypothesis: two peaks, corresponding to the innate and the acquired immune responses

Visualization: Further plots

  • Age/sex vs. number of symptoms
  • Vaccine manufacturer vs. number of symptoms
  • Age vs. types of symptoms
  • Sex vs. types of symptoms
  • Vaccine manufacturer vs. types of symptoms

Exploratory Data Analysis

Principal Component Analysis

PCA

Important verbs and tools used:

  • prcomp()
  • augment()
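
A minimal sketch of how these two functions fit together, assuming the broom package supplies augment(). The features are simulated, and N_SYMPTOMS is a hypothetical column name, since the slides do not list which variables entered the PCA.

```r
library(broom)

set.seed(1)
# Simulated numeric features standing in for columns of the merged data.
features <- data.frame(
  AGE_YRS    = rnorm(100, mean = 50, sd = 15),
  N_SYMPTOMS = rpois(100, lambda = 3),   # hypothetical column
  NUMDAYS    = rexp(100, rate = 0.5)
)

# Centre and scale so that no variable dominates because of its units.
pca_fit <- prcomp(features, center = TRUE, scale. = TRUE)

# augment() appends the component scores (.fittedPC1, .fittedPC2, ...) to the data.
scores <- augment(pca_fit, data = features)

# Rotation matrix: loading of each original variable on each component.
pca_fit$rotation

# Share of variance per component, the quantity a scree plot displays.
variance_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
variance_explained
```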

PCA

  • PCA plot and rotation matrix
  • Scree plot

Modelling

Logistic Regressions

Logistic Regression

Death ~ Patient Profile

Is the patient’s profile (sex, age, allergic/not, ill/not, has/had covid/not) correlated with death?

## # A tibble: 7 x 6
##   term           estimate std.error statistic  p.value odds_ratio
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>      <dbl>
## 1 (Intercept)    -9.39      0.161    -58.2    0         0.0000832
## 2 SEXM            0.929     0.0573    16.2    4.00e-59  2.53     
## 3 AGE_YRS         0.0914    0.00207   44.1    0         1.10     
## 4 HAS_ALLERGIESY -0.0204    0.0605    -0.338  7.35e- 1  0.980    
## 5 HAS_ILLNESSY    1.08      0.0654    16.4    8.86e-61  2.93     
## 6 HAS_COVIDY     -0.113     0.142     -0.794  4.27e- 1  0.893    
## 7 HAD_COVIDY     -0.00375   0.195     -0.0193 9.85e- 1  0.996
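
The odds_ratio column in the table is the exponential of the estimate, since logistic regression coefficients are on the log-odds scale. The sketch below fits a comparable model with base R's glm() on simulated data. The simulated coefficients are arbitrary and only two of the profile variables are included, so the output will not reproduce the table.

```r
# Check against the table: the SEXM estimate of 0.929 gives an odds ratio of about 2.53.
exp(0.929)

set.seed(42)
n <- 2000
# Simulated patients; the effect sizes below are arbitrary demo values.
sim <- data.frame(
  SEX     = factor(sample(c("F", "M"), n, replace = TRUE)),
  AGE_YRS = runif(n, min = 18, max = 95)
)
log_odds <- -6 + 0.9 * (sim$SEX == "M") + 0.06 * sim$AGE_YRS
sim$DIED <- rbinom(n, size = 1, prob = plogis(log_odds))

# Binomial family with the default logit link = logistic regression.
fit <- glm(DIED ~ SEX + AGE_YRS, data = sim, family = binomial)

# Coefficient table plus odds ratios obtained by exponentiating the estimates.
results <- as.data.frame(coef(summary(fit)))
results$odds_ratio <- exp(results$Estimate)
results
```

Read this way, the AGE_YRS odds ratio of 1.10 in the table means the odds of a reported death rise by about 10% per additional year of age, with the other variables held fixed.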


Logistic Regression

Death ~ Symptoms

Are some symptoms correlated with death?

## # A tibble: 20 x 6
##   term          estimate std.error statistic  p.value odds_ratio
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>      <dbl>
## 1 (Intercept)     -2.01     0.0287    -70.1  0             0.134
## 2 HEADACHETRUE    -1.67     0.156     -10.7  7.92e-27      0.188
## 3 PYREXIATRUE     -0.429    0.112      -3.82 1.34e- 4      0.651
## 4 CHILLSTRUE      -1.21     0.171      -7.11 1.17e-12      0.298
## 5 FATIGUETRUE     -0.367    0.115      -3.19 1.41e- 3      0.693
## 6 PAINTRUE        -0.913    0.153      -5.98 2.17e- 9      0.401
## 7 NAUSEATRUE      -0.621    0.139      -4.46 8.17e- 6      0.538
## 8 DIZZINESSTRUE   -2.17     0.193     -11.2  2.87e-29      0.114
## # … with 12 more rows


Many Logistic Regressions

Each Symptom ~ Takes Anti-Inflammatory

Does taking anti-inflammatories modify the chance of having symptoms?

## # A tibble: 20 x 9
##   SYMPTOM  estimate std.error statistic p.value conf.low conf.high odds_ratio
##   <chr>       <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>      <dbl>
## 1 HEADACHE  -0.164     0.0987    -1.67   0.0958   -0.362    0.0255      0.848
## 2 PYREXIA    0.0152    0.102      0.150  0.881    -0.189    0.211       1.02 
## 3 CHILLS    -0.121     0.109     -1.11   0.266    -0.340    0.0875      0.886
## 4 FATIGUE    0.0565    0.105      0.539  0.590    -0.154    0.258       1.06 
## 5 PAIN       0.0113    0.110      0.102  0.919    -0.210    0.222       1.01 
## # … with 15 more rows, and 1 more variable: identified_as <chr>
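
A sketch of the "one model per symptom" idea in base R, using simulated data in which the symptoms are independent of the predictor. The Bonferroni adjustment at the end is an added assumption: the slides do not say whether or how multiple testing was handled.

```r
set.seed(7)
n <- 500
# Simulated data: 0/1 symptom indicators, unrelated to the predictor by construction.
sim <- data.frame(
  TAKES_ANTIINFLAMATORY = factor(sample(c("N", "Y"), n, replace = TRUE)),
  HEADACHE = rbinom(n, 1, 0.3),
  PYREXIA  = rbinom(n, 1, 0.2),
  CHILLS   = rbinom(n, 1, 0.2)
)
symptom_cols <- c("HEADACHE", "PYREXIA", "CHILLS")

# Fit symptom ~ TAKES_ANTIINFLAMATORY and return the row for the "Y" level.
fit_one <- function(symptom) {
  fit <- glm(reformulate("TAKES_ANTIINFLAMATORY", response = symptom),
             data = sim, family = binomial)
  row <- coef(summary(fit))["TAKES_ANTIINFLAMATORYY", ]
  data.frame(SYMPTOM    = symptom,
             estimate   = row[["Estimate"]],
             p.value    = row[["Pr(>|z|)"]],
             odds_ratio = exp(row[["Estimate"]]))
}

results <- do.call(rbind, lapply(symptom_cols, fit_one))

# With one test per symptom, adjust the p-values before judging significance.
results$p.adjusted <- p.adjust(results$p.value, method = "bonferroni")
results
```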


Modelling

Chi-squared Contingency Table Tests

Vaccine Manufacturer and Death

Null hypothesis: the proportion of patients who died is the same for all three vaccine manufacturers

## # A tibble: 2 x 4
##   DIED  JANSSEN MODERNA `PFIZER-BIONTECH`
##   <chr>   <dbl>   <dbl>             <dbl>
## 1 N        1090   15281             15212
## 2 Y          16     972               841
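
The test can be run on the counts in the table with base R's chisq.test(). This is a sketch of the calculation, not necessarily the exact call used for the slides.

```r
# Counts copied from the contingency table above.
deaths_by_manufacturer <- matrix(
  c(1090, 15281, 15212,
      16,   972,   841),
  nrow = 2, byrow = TRUE,
  dimnames = list(DIED     = c("N", "Y"),
                  VAX_MANU = c("JANSSEN", "MODERNA", "PFIZER-BIONTECH"))
)

# Pearson chi-squared test of independence between DIED and VAX_MANU.
test <- chisq.test(deaths_by_manufacturer)

test$expected   # counts expected if the death proportion were equal everywhere
test$parameter  # degrees of freedom: (2 - 1) * (3 - 1)
test$p.value

# Observed proportion of reports with DIED = "Y" per manufacturer.
prop.table(deaths_by_manufacturer, margin = 2)["Y", ]
```

These proportions are computed among submitted reports, so a difference between manufacturers may also reflect who received each vaccine, for example different age groups.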

Chi-squared Contingency Table Tests

Sex and Death

Null hypothesis: the proportion of males that died after vaccination = the proportion of females that died after vaccination

## # A tibble: 2 x 3
##   DIED      F     M
##   <chr> <dbl> <dbl>
## 1 N     23271  7523
## 2 Y       799   991
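
For a 2 x 2 table like this one, the comparison can also be written as a two-sample proportion test. The sketch below uses base R's prop.test() on the counts above; whether the slides used prop.test() or chisq.test() is not stated.

```r
# Deaths and totals per sex, taken from the table above.
died  <- c(F = 799, M = 991)
total <- c(F = 23271 + 799, M = 7523 + 991)

# Observed proportion of reports with DIED = "Y" in each group.
died / total

# Two-sample test of equal proportions (continuity correction on by default).
test <- prop.test(x = died, n = total)
test$estimate
test$conf.int   # confidence interval for the difference F minus M
test$p.value
```

This is a crude comparison that ignores age and illness. The logistic regression on patient profile estimates the effect of sex while holding those variables fixed.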

Conclusion and discussion
